Prior to the workshop, we recommend you start by:
Having the latest version of R installed (at least 4.3)
Having the latest version of RStudio installed
Installing the STexampleData,
imcdatasets, scHOT, spicyR,
simpleSeg and lisaClust packages Bioconductor
and associated dependencies
If you use use RStudio, we recommend viewing this document in visual mode.
The following chunk of code will install all of the R packages you
need for this workshop. Please ensure that you do not run it more than
once or remove eval=FALSE from the code chunk, you do not
want to be installing these packages multiple times.
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(c("STexampleData", "imcdatasets", "scHOT", "simpleSeg", "FuseSOM", "spicyR", "lisaClust", "ClassifyR", "ggplot2", "scater", "scuttle", "batchelor", "patchwork", "plotly", "RColorBrewer"))
If you are viewing this Rmd before the workshop, you might also like to download the data that we will use. This can take time.
imc <- imcdatasets::Damond_2019_Pancreas(data_type = "spe")
spe <- STexampleData::seqFISH_mouseEmbryo()
suppressPackageStartupMessages({
# Data packages
library(STexampleData)
library(imcdatasets)
# Packages from scdney
library(scHOT)
library(simpleSeg)
library(FuseSOM)
library(spicyR)
library(lisaClust)
library(ClassifyR)
# Extra packages needed for workshop
library(ggplot2)
library(scater)
library(scuttle)
library(batchelor)
library(patchwork)
library(plotly)
library(RColorBrewer)
})
# We can use the following to increase computational speed.
# If you feel confident in the amount of CPU cores and/or memory that you have
# access to, feel free to increase nCores.
nCores <- 1
BPPARAM <- simpleSeg:::generateBPParam(nCores)
# The following will improve the aesthetics of some of the plots that we will
# generate.
theme_set(theme_classic())
source("data/celltype_colours.R")
This is the main working document for the Unlocking single cell spatial omics analyses with scdney workshop. We recommend you download a local copy of the Rmd file and while running through the analysis scripts see what edits or additions you would make.
We will use two motivating datasets:
Lohoff et al,
2022: A seqFISH study of early mouse organogenesis. We will use a
subset of data that is made available from the STExampleData
package.
(Damond
et al, 2019): An Imaging Mass Cytometry (IMC) dataset profiling the
spatial landscape of pancreatic islets in subjects with long-duration
diabetes, recent onset diabetes and controls. The data is downloaded
using the imcdatasets
package. The key conclusion of this manuscript (amongst others) is that
spatial organisation of cells is indicative of diabetes progress. We
will examine this data and assess a similar question using the packages
in scdney.
Here we will download the datasets, examine the structure and perform some exploratory analyses. This might take a few moments and you may be prompted to install some additional packages.
Here we download the seqFISH mouse embryo data. This is a
SpatialExperiment object, which extends the
SingleCellExperiment object.
spe <- STexampleData::seqFISH_mouseEmbryo()
spe
## class: SpatialExperiment
## dim: 351 11026
## metadata(0):
## assays(2): counts molecules
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(1): gene_name
## colnames(11026): embryo1_Pos0_cell10_z2 embryo1_Pos0_cell100_z2 ...
## embryo1_Pos28_cell97_z2 embryo1_Pos28_cell98_z2
## colData names(14): cell_id embryo ... segmentation_vertices sample_id
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(0):
We can use functions designed for SingleCellExperiment
objects in the scater package for plotting via the
reducedDim slot. We multiply the spatial coordinates by a
matrix to flip the y-axis and ensure we fix the aspect ratio.
spe <- logNormCounts(spe)
coord_transform <- matrix(c(1,0,0,-1), 2, 2, byrow = TRUE)
reducedDim(spe, "spatialCoords") <- spatialCoords(spe) %*% coord_transform
plotReducedDim(spe, "spatialCoords", colour_by = c("Sox2"), point_size = 1) +
coord_fixed()
Questions
# try to answer the above question using the spe object.
# you may want to check the SingleCellExperiment vignette.
# https://bioconductor.org/packages/3.17/bioc/vignettes/SingleCellExperiment/inst/doc/intro.html
We can perform a typical gene-expression based analysis for this data. Later in part two we will perform some specific analytical techniques, but for now let’s explore the dataset and use methods designed for single cell data.
Dimensionality reduction using PCA, batch correction across tiles
using the batchelor package, followed by UMAP and
plotting.
spe <- runPCA(spe)
b.out <- batchelor::batchCorrect(spe, batch = spe$pos, assay.type = "logcounts", PARAM=FastMnnParam(d=20))
reducedDim(spe, "FastMnn") <- reducedDim(b.out, "corrected")
spe <- runUMAP(spe, dimred = "FastMnn")
spe
## class: SpatialExperiment
## dim: 351 11026
## metadata(0):
## assays(3): counts molecules logcounts
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(1): gene_name
## colnames(11026): embryo1_Pos0_cell10_z2 embryo1_Pos0_cell100_z2 ...
## embryo1_Pos28_cell97_z2 embryo1_Pos28_cell98_z2
## colData names(15): cell_id embryo ... sample_id sizeFactor
## reducedDimNames(4): spatialCoords PCA FastMnn UMAP
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(1): sample_id
g_celltype_umap <- plotReducedDim(spe, "UMAP", colour_by = "celltype_mapped_refined") +
scale_colour_manual(values = celltype_colours)
g_celltype_umap
plotReducedDim(spe, "UMAP", colour_by = "Sox2")
g_celltype_spatial <- plotReducedDim(spe, "spatialCoords", colour_by = "celltype_mapped_refined") +
scale_colour_manual(values = celltype_colours) +
coord_fixed()
g_all <- g_celltype_spatial + theme(legend.position = "none") + g_celltype_umap
g_all
Advanced/Extension Question
ggiraph
package for extending the g_all object to an interactive
plot with a tooltip that links the spatial and UMAP coordinate systems.
(Hint: This may involve generating a new ggplot object outside of the
plotReducedDim function.)# try to examine answer the above questions using the spe object.
# you may want to set up some small simulation..
At this point we will pause our examination of the seqFISH dataset
that is in the object spe, and turn over to the second
example dataset. In the second part we will revisit this data for
performing scHOT testing.
Here we download a subset of the IMC data from the
imcdatasets package. This is also a
SpatialExperiment object.
imc <- imcdatasets::Damond_2019_Pancreas(data_type = "spe")
imc
## class: SpatialExperiment
## dim: 38 252059
## metadata(0):
## assays(3): counts exprs quant_norm
## rownames(38): H3 SMA ... DNA1 DNA2
## rowData names(6): channel metal ... antibody_clone full_name
## colnames(252059): 138_1 138_2 ... 319_1149 319_1150
## colData names(27): cell_id image_name ... patient_BMI sample_id
## reducedDimNames(0):
## mainExpName: Damond_2019_Pancreas_v1
## altExpNames(0):
## spatialCoords names(2) : cell_x cell_y
## imgData names(0):
Questions
# try to answer the above question using the imc object.
# you may want to check the SingleCellExperiment vignette.
# https://www.bioconductor.org/packages/release/bioc/vignettes/SpatialExperiment/inst/doc/SpatialExperiment.html
To make things faster and less computationally demanding, we’ll subset the data down to 60 pancreatic islets from an equal number of non-diabetic, recent onset and long-duration diabetics.
# set the seed so the sampling is determined
set.seed(51773)
# sample 20 image names from the 3 conditions
useImages <- c(
sample(imc$image_name[imc$patient_stage == "Non-diabetic"], 20),
sample(imc$image_name[imc$patient_stage == "Onset"], 20),
sample(imc$image_name[imc$patient_stage == "Long-duration"], 20)
)
# subset the data with the images names we sampled
imc <- imc[, imc$image_name %in% useImages]
As our data is stored in a SpatialExperiment, as we did
previously, we can use scater to perform and visualise our
data in a lower dimensional embedding to look for image or cluster
differences.
set.seed(51773)
# Perform dimension reduction using UMAP.
imc <- scater::runUMAP(imc, exprs_values = "counts", name = "UMAP_raw")
# UMAP by imageID.
scater::plotReducedDim(imc, dimred = "UMAP_raw", colour_by = "image_name")
Questions
# try to answer the questions here!
We should check to see if the marker intensities of each cell require some form of transformation or normalisation to control for subtle systematic differences that may have emerged during measurement.
Here, we extract the intensities from the counts assay.
Looking at SMA, which should be expressed in the majority of the stromal
cells, we can see that the intensities are clearly very skewed and the
peaks of the lower intensities don’t overlap.
# Extract marker data and bind with information about images
df <- as.data.frame(cbind(colData(imc), t(assay(imc, "counts"))))
# Plots densities of PanKRT for each image.
ggplot(df, aes(x = sqrt(SMA), colour = image_name)) +
geom_density() +
theme(legend.position = "none") +
xlim(0,4) +
labs(x = "SMA", y = "Density by Image")
We can transform and normalise our data using the
normalizeCells function. Here we have taken the intensities
from the counts assay, performed an inverse hyperbolic sine
transform, then for each image trimmed the 99 quantile, scaled the means
to be equal and then removed the first principal component. This
modified data is then store in the norm assay by default.
We can see that this normalised data appears to align more, not perfect,
but likely sufficient for clustering.
# Transform and normalise the marker expression of each cell type.
# Use a square root transform, then trimmed the 99 quantile
imc <- simpleSeg::normalizeCells(imc,
transformation = "asinh",
method = c("trim99", "mean", "PC1"),
assayIn = "counts",
cores = nCores,
imageID = "image_name"
)
# Extract normalised marker information.
df <- as.data.frame(cbind(colData(imc), t(assay(imc, "norm"))))
# Plots densities of normalised PanKRT for each image.
ggplot(df, aes(x = SMA, colour = image_name)) +
geom_density() +
theme(legend.position = "none")
Questions
# try to answer the questions here!
At this point you should have a sense of the data structure and feel confident in generating visualisations and finding information on how to perform other kinds of explorations. Time for a few minutes to stand up and stretch and get ready for Part 2.
Now that are we are comfortable with the two datasets, let’s perform some analytical techniques that are specific to spatial omics data.
Here we will ask which gene patterns we observe to be changing across the spe$gutRegion cell type in space. Note that we want to assess the anatomical region corresponding to the anterior end of the developing gut developing brain so we will first subset the cells using the spatial coordinates. We can check what we have selected by plotting.
spe$gutRegion <- spe$celltype_mapped_refined == "Gut tube" &
reducedDim(spe, "spatialCoords")[,1] < -0.5
plotReducedDim(spe, "spatialCoords", colour_by = "gutRegion") +
coord_fixed() +
scale_colour_manual(values = c("TRUE" = "red", "FALSE" = "grey"))
Let’s subset the data to only these cells and continue with our scHOT analysis.
spe_gut <- spe[,spe$gutRegion]
spe_gut
## class: SpatialExperiment
## dim: 351 472
## metadata(0):
## assays(3): counts molecules logcounts
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(1): gene_name
## colnames(472): embryo1_Pos3_cell377_z2 embryo1_Pos3_cell388_z2 ...
## embryo1_Pos27_cell74_z2 embryo1_Pos28_cell373_z2
## colData names(16): cell_id embryo ... sizeFactor gutRegion
## reducedDimNames(4): spatialCoords PCA FastMnn UMAP
## mainExpName: NULL
## altExpNames(0):
## spatialCoords names(2) : x y
## imgData names(1): sample_id
We select genes with at least some proportion of expressed cells for
testing, and create the scHOT object.
hist(rowMeans(counts(spe_gut)>0), 40)
gene_to_test <- as.matrix(c(rownames(spe_gut[rowMeans(counts(spe_gut)>0) > 0.2,])))
length(gene_to_test)
## [1] 165
rownames(gene_to_test) <- apply(gene_to_test, 1, paste0, collapse = "_")
head(gene_to_test)
## [,1]
## Acvr1 "Acvr1"
## Acvr2a "Acvr2a"
## Ahnak "Ahnak"
## Akr1c19 "Akr1c19"
## Aldh1a2 "Aldh1a2"
## Aldh2 "Aldh2"
scHOT_spatial <- scHOT_buildFromSCE(spe_gut,
assayName = "logcounts",
positionType = "spatial",
positionColData = c("x_global_affine", "y_global_affine"))
scHOT_spatial
## class: scHOT
## dim: 351 472
## metadata(0):
## assays(1): expression
## rownames(351): Abcc4 Acp5 ... Zfp57 Zic3
## rowData names(0):
## colnames(472): embryo1_Pos3_cell377_z2 embryo1_Pos3_cell388_z2 ...
## embryo1_Pos27_cell74_z2 embryo1_Pos28_cell373_z2
## colData names(16): cell_id embryo ... sizeFactor gutRegion
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## testingScaffold dim: 0 0
## weightMatrix dim: 0 0
## scHOT_output colnames (0):
## param names (0):
## position type: spatial
We now add the testing scaffold to the scHOT object, and
set the local weight matrix for testing, with a choice of span of 0.1
(the proportion of cells to weight around each cell). We can speed up
computation by not requiring the weight matrix correspond to every
individual cell, but instead a random selection among all the cells
using the thin function.
scHOT_spatial <- scHOT_addTestingScaffold(scHOT_spatial, gene_to_test)
head(scHOT_spatial@testingScaffold)
## gene_1
## Acvr1 "Acvr1"
## Acvr2a "Acvr2a"
## Ahnak "Ahnak"
## Akr1c19 "Akr1c19"
## Aldh1a2 "Aldh1a2"
## Aldh2 "Aldh2"
scHOT_spatial <- scHOT_setWeightMatrix(scHOT_spatial, span = 0.2)
scHOT_spatial@weightMatrix <- thin(scHOT_spatial@weightMatrix, n = 50)
dim(slot(scHOT_spatial, "weightMatrix"))
## [1] 53 472
For a given cell we can visually examine the local weight given by the span parameter.
cellID = 10
df <- cbind(as.data.frame(colData(scHOT_spatial)),
W = slot(scHOT_spatial, "weightMatrix")[cellID,])
ggplot(df,
aes(x = x_global_affine, y = -y_global_affine)) +
geom_point(aes(colour = W, size = W)) +
scale_colour_gradient(low = "black", high = "purple") +
scale_size_continuous(range = c(0.5,2.5)) +
theme_classic() +
guides(colour = guide_legend(title = "Spatial Weight"),
size = guide_legend(title = "Spatial Weight")) +
ggtitle(paste0("Central cell: ", cellID)) +
coord_fixed() +
NULL
Question
## Make associated changes to the code to test out the question above.
We set the higher order function as the weighted mean function, and then calculate the observed higher order test statistics. This may take around 10 seconds.
scHOT_spatial <- scHOT_calculateGlobalHigherOrderFunction(
scHOT_spatial,
higherOrderFunction = weightedMean,
higherOrderFunctionType = "weighted")
slot(scHOT_spatial, "scHOT_output")
## DataFrame with 165 rows and 2 columns
## gene_1 globalHigherOrderFunction
## <character> <matrix>
## Acvr1 Acvr1 0.216666
## Acvr2a Acvr2a 0.375776
## Ahnak Ahnak 0.976418
## Akr1c19 Akr1c19 0.744070
## Aldh1a2 Aldh1a2 0.245981
## ... ... ...
## Wnt5a Wnt5a 0.335820
## Wnt5b Wnt5b 0.220300
## Xist Xist 1.162241
## Zfp444 Zfp444 0.744082
## Zfp57 Zfp57 0.595519
scHOT_spatial <- scHOT_calculateHigherOrderTestStatistics(
scHOT_spatial, na.rm = TRUE)
Now we can plot the overall mean versus the scHOT statistic to
observe any relationship. Labels can be interactively visualised using
ggplotly. Some genes may have different distributions so we
turn to permutation testing to assess statistical significance.
g <- ggplot(as.data.frame(scHOT_spatial@scHOT_output),
aes(x = globalHigherOrderFunction, y = higherOrderStatistic, label = gene_1)) +
xlab("Mean across all cells") +
ylab("scHOT statistic for local weightedMean") +
geom_point()
g
ggplotly(g)
Set up the permutation testing schema. For the purposes of this workshop we set a low number of permutations over a low number of genes in the testing scaffold, you may want to change this as you work through the workshop yourself. The testing will take a few minutes to run, here with the parallel parameters that were set at the beginning of this document.
scHOT_spatial <- scHOT_setPermutationScaffold(scHOT_spatial,
numberPermutations = 50,
numberScaffold = 30)
scHOT_spatial <- scHOT_performPermutationTest(
scHOT_spatial,
verbose = TRUE,
parallel = FALSE)
## Permutation testing combination 40 of 165...
## Permutation testing combination 50 of 165...
## Permutation testing combination 80 of 165...
## Permutation testing combination 150 of 165...
slot(scHOT_spatial, "scHOT_output")
## DataFrame with 165 rows and 9 columns
## gene_1 globalHigherOrderFunction higherOrderSequence
## <character> <matrix> <NumericList>
## Acvr1 Acvr1 0.216666 0.251205,0.275076,0.286668,...
## Acvr2a Acvr2a 0.375776 0.398236,0.376223,0.361763,...
## Ahnak Ahnak 0.976418 1.23931,1.22101,1.19278,...
## Akr1c19 Akr1c19 0.744070 0.681732,0.622183,0.625407,...
## Aldh1a2 Aldh1a2 0.245981 0.117491,0.118105,0.121221,...
## ... ... ... ...
## Wnt5a Wnt5a 0.335820 0.282418,0.280240,0.268180,...
## Wnt5b Wnt5b 0.220300 0.262440,0.321449,0.368172,...
## Xist Xist 1.162241 1.18893,1.17123,1.18238,...
## Zfp444 Zfp444 0.744082 0.529888,0.531771,0.538540,...
## Zfp57 Zfp57 0.595519 0.853046,0.844188,0.838651,...
## higherOrderStatistic numberPermutations storePermutations
## <numeric> <numeric> <logical>
## Acvr1 0.0750954 0 TRUE
## Acvr2a 0.0665143 0 TRUE
## Ahnak 0.3319897 0 TRUE
## Akr1c19 0.1673342 0 TRUE
## Aldh1a2 0.1827836 0 TRUE
## ... ... ... ...
## Wnt5a 0.172300 0 TRUE
## Wnt5b 0.104924 0 TRUE
## Xist 0.120828 0 TRUE
## Zfp444 0.118930 50 TRUE
## Zfp57 0.130128 0 TRUE
## permutations pvalPermutations FDRPermutations
## <NumericList> <numeric> <numeric>
## Acvr1 NA NA NA
## Acvr2a NA NA NA
## Ahnak NA NA NA
## Akr1c19 NA NA NA
## Aldh1a2 NA NA NA
## ... ... ... ...
## Wnt5a NA NA NA
## Wnt5b NA NA NA
## Xist NA NA NA
## Zfp444 0.0582096,0.0537059,0.0641674,... 0.04 0.044
## Zfp57 NA NA NA
After the permutation test we can estimate the P-values across all genes.
scHOT_plotPermutationDistributions(scHOT_spatial)
scHOT_spatial <- scHOT_estimatePvalues(scHOT_spatial,
nperm_estimate = 100,
maxDist = 0.1)
slot(scHOT_spatial, "scHOT_output")
## DataFrame with 165 rows and 14 columns
## gene_1 globalHigherOrderFunction higherOrderSequence
## <character> <matrix> <NumericList>
## Acvr1 Acvr1 0.216666 0.251205,0.275076,0.286668,...
## Acvr2a Acvr2a 0.375776 0.398236,0.376223,0.361763,...
## Ahnak Ahnak 0.976418 1.23931,1.22101,1.19278,...
## Akr1c19 Akr1c19 0.744070 0.681732,0.622183,0.625407,...
## Aldh1a2 Aldh1a2 0.245981 0.117491,0.118105,0.121221,...
## ... ... ... ...
## Wnt5a Wnt5a 0.335820 0.282418,0.280240,0.268180,...
## Wnt5b Wnt5b 0.220300 0.262440,0.321449,0.368172,...
## Xist Xist 1.162241 1.18893,1.17123,1.18238,...
## Zfp444 Zfp444 0.744082 0.529888,0.531771,0.538540,...
## Zfp57 Zfp57 0.595519 0.853046,0.844188,0.838651,...
## higherOrderStatistic numberPermutations storePermutations
## <numeric> <numeric> <logical>
## Acvr1 0.0750954 0 TRUE
## Acvr2a 0.0665143 0 TRUE
## Ahnak 0.3319897 0 TRUE
## Akr1c19 0.1673342 0 TRUE
## Aldh1a2 0.1827836 0 TRUE
## ... ... ... ...
## Wnt5a 0.172300 0 TRUE
## Wnt5b 0.104924 0 TRUE
## Xist 0.120828 0 TRUE
## Zfp444 0.118930 50 TRUE
## Zfp57 0.130128 0 TRUE
## permutations pvalPermutations FDRPermutations
## <NumericList> <numeric> <numeric>
## Acvr1 NA NA NA
## Acvr2a NA NA NA
## Ahnak NA NA NA
## Akr1c19 NA NA NA
## Aldh1a2 NA NA NA
## ... ... ... ...
## Wnt5a NA NA NA
## Wnt5b NA NA NA
## Xist NA NA NA
## Zfp444 0.0582096,0.0537059,0.0641674,... 0.04 0.044
## Zfp57 NA NA NA
## numberPermutationsEstimated globalLowerRangeEstimated
## <integer> <numeric>
## Acvr1 50 0.270287
## Acvr2a 50 0.412098
## Ahnak 1100 0.270287
## Akr1c19 100 0.744082
## Aldh1a2 50 0.270287
## ... ... ...
## Wnt5a 100 0.270287
## Wnt5b 50 0.270287
## Xist 100 1.163058
## Zfp444 100 0.744082
## Zfp57 100 0.508846
## globalUpperRangeEstimated pvalEstimated FDREstimated
## <numeric> <numeric> <numeric>
## Acvr1 0.270287 0.100000000 0.1231343
## Acvr2a 0.412098 0.480000000 0.5109677
## Ahnak 3.171420 0.000908265 0.0239130
## Akr1c19 0.809843 0.009900990 0.0239130
## Aldh1a2 0.270287 0.019607843 0.0282051
## ... ... ... ...
## Wnt5a 0.412098 0.00990099 0.0239130
## Wnt5b 0.270287 0.01960784 0.0282051
## Xist 1.230984 0.26000000 0.2958621
## Zfp444 0.809843 0.04000000 0.0532258
## Zfp57 0.611378 0.03000000 0.0415966
We can now examine the spatial expression of the 5 most significant genes, both in our scHOT object and over our original spe object.
output_sorted <- slot(scHOT_spatial, "scHOT_output")[order(slot(scHOT_spatial,
"scHOT_output")$pvalEstimated),]
topgenes <- rownames(output_sorted)[1:5]
reducedDim(scHOT_spatial, "spatialCoords") <- reducedDim(spe, "spatialCoords")[colnames(scHOT_spatial),]
for (topgene in topgenes) {
g_spe <- plotReducedDim(spe, "spatialCoords", colour_by = c(topgene), point_size = 1) +
coord_fixed()
g_scHOT <- plotReducedDim(scHOT_spatial, "spatialCoords", colour_by = c(topgene), point_size = 1,
by_exprs_values = "expression") +
coord_fixed()
g_all <- g_scHOT + g_spe
print(g_all)
}
Here we are noting the genes that are found to have the most statistically significant spatial variation in their local mean expression. These genes point to specific patterns that govern the development of individual parts of the gut tube.
Advanced/Extended Questions
## try some code
Now that we have assessed genes varying in expression in space, let’s look further into the IMC data where we will make use of the multiple samples to perform clustering and extract some biological understanding.
First, let’s select the markers that we’d like to use to cluster the cells. Clustering partitions the data by the largest sources of variation. If there are lots of markers for a specific cell type, this will cluster very well.
This step requires domain knowledge about how different cells express different markers. Using irrelevant markers to cluster the cells will lead to meaningless clusters.
useMarkers <- c(
"NKX6_1", # Homeobox protein Nkx-6.1 β 169Tm
"IAPP", # Amylin β 167Er
"GCG", # Glucagon α 156Gd
"PCSK2", # Proprotein convertase 2 α 144Nd
"SST", # Somatostatin δ 159Tb
"PPY", # Pancreatic polypeptide γ 153Eu
"PDX1", # Pancreatic and duodenal homeobox 1 β, δ, ductal 158Gd
"SYP", # Synaptophysin Endocrine 160Gd
"CD99", # CD99 Endocrine 145Nd
"SLC2A1", # Glucose transporter 1 Endocrine 148Nd
"PTPRN", # Receptor-type tyrosine-protein phosphatase-like N Endocrine 174Yb
"AMY2A", # Pancreatic amylase Acinar 150Nd
"KRT19", # Cytokeratin 19 Ductal 161Dy
"CD44", # CD44 Exocrine 143Nd
"CD45", # CD45 Immune 162Dy
"CD45RA", # CD45RA Immune 164Dy
"CD3e", # CD3ɛ T 152Sm
"CD4", # CD4 Helper T 171Yb
"CD8a", # CD8a Cytotoxic T 165Ho
"CD20", # CD20 B 149Sm
"CD68", # CD68 Monocytes, macrophages 146Nd
"MPO", # Myeloperoxidase Neutrophils 147Sm
"FOXP3", # Forkhead box P3 Regulatory T 163Dy
"CD38", # CD38 Immune 142Nd
"CDH1", # E-/P-cadherin Epithelial 173Yb
"CD31", # CD31 Endothelial 172Yb
"SMA", # Smooth muscle actin Stromal 115In
"Ki67", # Ki-67 Proliferating 168Er
"p_HH3", # Phospho-histone H3 Proliferating 170Er
"p_Rb", # Phospho-retinoblastoma Cycling 175Yb
"cPARP_cCASP3", # Cleaved caspase 3 + cleaved poly (ADP-ribose) polymerase Apoptotic 176Yb
"CA9" # Carbonic anhydrase IX Hypoxic 166Er
)
Clustering
Here we cluster using the runFuseSOM function. We have
chosen to specify the same subset of markers used in the original
manuscript for gating cell types. We have also specified the number of
clusters to identify to be numClusters = 13.
set.seed(51773)
imc <- runFuseSOM(imc,
markers = useMarkers,
assay = "norm",
numClusters = 13
)
Clusters Selection
We can check to see how reasonable our choice of 13 clusters is using
the estimateNumCluster and the optiPlot
functions. Here we examine the Gap method, others such as jump,
silhouette and within cluster distance are also available.
imc <- estimateNumCluster(imc, kSeq = 2:30)
optiPlot(imc, method = "gap")
imc@metadata$clusterEstimation$Discriminant
## [1] 12
Cluster Interpretation
We can begin the process of understanding what each of these cell
clusters are by using the plotGroupedHeatmap function from
scater. At the least, here, we can see we capture a few
different populations of islet cells and a few immune cell populates
(including CD4+ Tcells and CD8+ Tcells).
scater::plotGroupedHeatmap(imc,
features = useMarkers,
group = "clusters",
exprs_values = "exprs",
center = TRUE,
scale = TRUE,
zlim = c(-3, 3),
cluster_rows = FALSE,
block = "clusters" #block is needed because of a current bug in the function.
)
Damond et al. define their cell types using a stepwise clustering procedure with manual merge of clusters. We can compare our clustering to theirs.
regionMap(imc, cellType = "clusters", region = "cell_type") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Checking Cluster Frequencies
We find it always useful to check the number of cells in each cluster.
# Check cluster frequencies.
colData(imc)$clusters |>
table() |>
sort()
##
## cluster_6 cluster_12 cluster_4 cluster_2 cluster_3 cluster_13 cluster_10
## 887 1400 1552 1914 1964 2207 2980
## cluster_11 cluster_1 cluster_7 cluster_8 cluster_5 cluster_9
## 3397 4071 4836 6279 33877 56023
Visualise cell types in an image
Look at the distribution of cells types in an image
reducedDim(imc, "spatialCoords") <- spatialCoords(imc)
plotReducedDim(imc[, imc$image_name == "E02"], "spatialCoords", colour_by = "clusters")
Questions
# try to answer the questions here!
Testing Relationships
Next, we test if there is an association between the proportion of
each cell type in our clusters and progression status of disease. To do
this, we recommend using a package such as diffcyt for
testing for changes in abundance of cell types. However, the
colTest function from spicyR allows us to
quickly test for associations between the proportions of the cell types
and disease status using either Wilcoxon rank sum tests or t-tests. Here
we see a couple of p-values less than 0.05.
# Select cells which belong to Non-diabetic individuals and those with recent onset diabetes.
cellsToUse <- imc$patient_stage %in% c("Non-diabetic", "Onset")
# Perform simple t tests on the columns of the proportion matrix.
testProp <- colTest(imc[, cellsToUse],
condition = "patient_stage",
feature = "clusters",
imageID = "image_name"
)
testProp
## mean in group Non-diabetic mean in group Onset t pval adjPval
## cluster_3 0.0360 0.00100 21.00 3.3e-12 4.3e-11
## cluster_13 0.0630 0.00024 12.00 4.7e-08 3.1e-07
## cluster_2 0.0056 0.02600 -6.80 2.1e-06 9.1e-06
## cluster_8 0.0390 0.07300 -5.90 8.9e-06 2.9e-05
## cluster_7 0.0480 0.03400 6.30 1.1e-05 2.9e-05
## cluster_6 0.0049 0.00940 -3.80 9.7e-04 2.1e-03
## cluster_11 0.0220 0.04600 -3.10 7.6e-03 1.4e-02
## cluster_5 0.2700 0.29000 -2.80 9.7e-03 1.6e-02
## cluster_4 0.0083 0.01200 -2.80 1.1e-02 1.6e-02
## cluster_10 0.0230 0.02500 -1.10 3.0e-01 3.9e-01
## cluster_12 0.0080 0.00910 -0.78 4.5e-01 5.3e-01
## cluster_1 0.0350 0.03500 0.13 9.0e-01 9.0e-01
## cluster_9 0.4400 0.44000 -0.13 9.0e-01 9.0e-01
## cluster
## cluster_3 cluster_3
## cluster_13 cluster_13
## cluster_2 cluster_2
## cluster_8 cluster_8
## cluster_7 cluster_7
## cluster_6 cluster_6
## cluster_11 cluster_11
## cluster_5 cluster_5
## cluster_4 cluster_4
## cluster_10 cluster_10
## cluster_12 cluster_12
## cluster_1 cluster_1
## cluster_9 cluster_9
Our package, spicyR, which can be found on Bioconductor,
provides a series of functions to aid in the analysis of both
immunofluorescence and mass cytometry imaging data as well as other
assays that can deeply phenotype individual cells and their spatial
location. Here, we use the spicy function to test for
changes in the spatial relationships between pairwise combinations of
cells. We quantify spatial relationships using a combination of three
radii Rs = c(20, 50, 100) and mildly account for some
global tissue structure using sigma = 50.
# Select cells which belong to Non-diabetic individuals and those with recent onset diabetes.
cellsToUse <- imc$patient_stage %in% c("Non-diabetic", "Onset")
# Test for changes in pairwise spatial relationships between cell types.
spicyTest <- spicy(imc[, cellsToUse],
condition = "patient_stage",
cellType = "clusters",
imageID = "image_name",
spatialCoords = c("cell_x", "cell_y"),
Rs = c(20, 50, 100),
sigma = 50,
BPPARAM = BPPARAM
)
topPairs(spicyTest, n = 10)
## intercept coefficient p.value adj.pvalue
## cluster_13__cluster_13 198.3903501 -332.421086 7.916473e-10 1.337884e-07
## cluster_7__cluster_9 1.9265714 -10.498368 1.383730e-08 8.171808e-07
## cluster_10__cluster_7 -34.3992900 38.738667 1.450617e-08 8.171808e-07
## cluster_7__cluster_10 -31.3792510 36.020947 2.939840e-08 1.085683e-06
## cluster_11__cluster_2 -88.9167734 102.045503 3.212079e-08 1.085683e-06
## cluster_2__cluster_11 -85.8936506 98.702754 6.829779e-08 1.923721e-06
## cluster_9__cluster_7 1.9090631 -8.942701 1.012202e-07 2.443745e-06
## cluster_11__cluster_7 -42.9063743 47.450995 2.407615e-07 5.086086e-06
## cluster_7__cluster_11 -39.8747474 44.537604 7.870787e-07 1.477959e-05
## cluster_6__cluster_9 -0.7721868 -19.414545 1.211581e-06 2.047572e-05
## from to
## cluster_13__cluster_13 cluster_13 cluster_13
## cluster_7__cluster_9 cluster_7 cluster_9
## cluster_10__cluster_7 cluster_10 cluster_7
## cluster_7__cluster_10 cluster_7 cluster_10
## cluster_11__cluster_2 cluster_11 cluster_2
## cluster_2__cluster_11 cluster_2 cluster_11
## cluster_9__cluster_7 cluster_9 cluster_7
## cluster_11__cluster_7 cluster_11 cluster_7
## cluster_7__cluster_11 cluster_7 cluster_11
## cluster_6__cluster_9 cluster_6 cluster_9
We can visualise these tests using signifPlot.
signifPlot(spicyTest,
breaks = c(-1.5, 3, 0.5),
cutoff = 0.0001
)
We can view some of the interesting relationships in images. Lets first identify images with “interesting” relationships between cluster 2 (likely T cells) with cluster 11 (likely islet cells).
# Sorry, this code could be made cleaner. This seems like it would be a useful function
# Get outcome data
df <- colData(imc)[cellsToUse, c("image_name", "patient_stage")]
df <- unique(df)
rownames(df) <- df$image_name
# Extract relationship
df$spatialRelationship <- spicyTest$pairwiseAssoc[["cluster_2__cluster_11"]]
df <- as.data.frame(df)
# Visualise relationships
p1 <- ggplot(df, aes(x = patient_stage, y = spatialRelationship, label = image_name)) +
geom_violin() + geom_jitter()
# Use plotly to look for representative images
ggplotly(p1)
We can now plot these images using plotReducedDim.
# Lets ensure that the colours of cell types stay consistent
celltypes <- unique(imc$clusters)
cellTypeColour <- brewer.pal(length(celltypes), "Paired") |>
colorRampPalette()
cellTypeColour <- cellTypeColour(length(celltypes))
names(cellTypeColour) <- celltypes
# Choose which cells to highlight
imc$highlight <- imc$clusters %in% c("cluster_2", "cluster_11")
imc$islet <- imc$clusters %in% c("cluster_10", "cluster_11", "cluster_12", "cluster_13")
# Plot two images
p2 <- plotReducedDim(imc[, imc$image_name == "G17"], "spatialCoords", colour_by = "clusters", size_by = "highlight", shape_by = "islet") + scale_colour_manual(values = cellTypeColour)
p3 <- plotReducedDim(imc[, imc$image_name == "E13"], "spatialCoords", colour_by = "clusters", size_by = "highlight", shape_by = "islet") + scale_colour_manual(values = cellTypeColour)
p2+p3
Question
# try to answer the questions here!
Our package, lisaClust, which can be found on Bioconductor,
provides a series of functions to identify and visualise regions of
tissue where spatial associations between cell-types is similar. This
package can be used to provide a high-level summary of cell-type
colocalisation in multiplexed imaging data that has been segmented at a
single-cell resolution. Here we use the lisaClust function
to clusters cells into 5 regions with distinct spatial ordering.
set.seed(51773)
# Cluster cells into spatial regions with similar composition.
imc <- lisaClust(imc,
k = 5,
Rs = c(20, 50, 100),
sigma = 50,
spatialCoords = c("cell_x", "cell_y"),
cellType = "clusters",
imageID = "image_name",
BPPARAM = BPPARAM
)
Region - cell type enrichment heatmap
We can try to interpret which spatial orderings the regions are
quantifying using the regionMap function. This plots the
frequency of each cell type in a region relative to what you would
expect by chance.
# Visualise the enrichment of each cell type in each region
regionMap(imc, cellType = "clusters", limit = c(0.2, 5))
Visualise regions
By default, these identified regions are stored in the
regions column in the colData of our object.
We can quickly examine the spatial arrangement of these regions using
ggplot.
plotReducedDim(imc[, imc$image_name == "E02"], "spatialCoords", colour_by = "region")
While much slower, we have also implemented a function for overlaying the region information as a hatching pattern so that the information can be viewed simultaneously with the cell type calls.
# Use hatching to visualise regions and cell types.
hatchingPlot(imc,
imageID = "image_name",
useImages = "E02",
cellType = "clusters",
spatialCoords = c("cell_x", "cell_y"),
line.spacing = 31
) + scale_colour_manual(values = cellTypeColour)
Visualise proportions of cells from different regions
regionProp <- getProp(imc, feature = "region", imageID = "image_name")
stage <- imc |>
colData() |>
as.data.frame() |>
dplyr::select("image_name", "patient_stage") |>
unique() |>
mutate(patient_stage = factor(patient_stage, levels = c(
"Non-diabetic",
"Onset",
"Long-duration"
)))
df <- regionProp |>
mutate(image_name = rownames(regionProp)) |>
dplyr::right_join(stage, by = "image_name") |>
tidyr::pivot_longer(
cols = starts_with("region"),
names_to = "region",
values_to = "proportion"
)
ggplot(df, aes(x = region, y = proportion, colour = patient_stage)) +
geom_boxplot()
Test for changes in proportions of the regions
# Select cells which belong to Non-diabetic individuals and those with recent onset diabetes.
cellsToUse <- imc$patient_stage %in% c("Non-diabetic", "Onset")
# Perform simple t tests on the columns of the proportion matrix.
testRegions <- colTest(imc[, cellsToUse],
condition = "patient_stage",
feature = "region",
imageID = "image_name"
)
testRegions
## mean in group Non-diabetic mean in group Onset t pval adjPval
## region_4 0.120 0.00064 14.0 1.2e-08 6.0e-08
## region_1 0.018 0.11000 -8.6 3.5e-07 8.8e-07
## region_3 0.130 0.18000 -2.8 9.6e-03 1.6e-02
## region_2 0.190 0.17000 1.0 3.2e-01 4.0e-01
## region_5 0.540 0.53000 0.5 6.2e-01 6.2e-01
## cluster
## region_4 region_4
## region_1 region_1
## region_3 region_3
## region_2 region_2
## region_5 region_5
Question
Our ClassifyR package, https://github.com/SydneyBioX/ClassifyR, formalises a
convenient framework for evaluating classification in R. We provide
functionality to easily include four key modelling stages; Data
transformation, feature selection, classifier training and prediction;
into a cross-validation loop. Here we use the crossValidate
function to perform 5 repeats of 3-fold cross-validation (we typically
use 100 repeats of 5-fold cross-validation) to evaluate the performance
of an randomForest applied to three quantification of our
IMC data; cell type proportions, average pairwise distances between
cell-types and region proportions.
In particular, we are evaluating the effectiveness of classifying islets from people with recent onset versus long-duration diabetes.
# Create list to store data.frames
data <- list()
# Add proportions of each cell type in each image
data[["props"]] <- getProp(imc, feature = "clusters", imageID = "image_name")
# Add pairwise associations
data[["dist"]] <- getPairwise(imc,
spatialCoords = c("cell_x", "cell_y"),
imageID = "image_name",
cellType = "clusters",
Rs = c(20, 50, 100),
sigma = 50,
BPPARAM = BPPARAM
)
# Add proportions of each region in each image
# to the list of dataframes.
data[["regions"]] <- getProp(imc, feature = "region", imageID = "image_name")
# Get outcome data
df <- colData(imc)[, c("image_name", "patient_stage")]
df <- unique(df)
rownames(df) <- df$image_name
outcome <- df$patient_stage
names(outcome) <- df$image_name
# Only use onset vs long-duration
outcome <- outcome[outcome %in% c("Long-duration", "Onset")]
outcome <- factor(outcome)
measurements <- lapply(data, function(x) x[names(outcome), ])
# Set seed
set.seed(51773)
# Perform cross-validation of a randomForest model
# for timing reasons we select a scheme
# with 10 repeats of 3-fold cross-validation.
cv <- crossValidate(
measurements = measurements,
outcome = outcome,
classifier = "randomForest",
nFolds = 3,
nRepeats = 5,
nCores = nCores
)
Visualise cross-validated prediction performance
Here, we use the performancePlot function to assess the
AUC from each repeat of the 3-fold cross-validation. We see that the
lisaClust regions appear to capture the least amount of information that
is predictive of diabetes progression status of the patients.
# Calculate AUC for each cross-validation repeat and plot.
performancePlot(cv,
metric = "AUC",
characteristicsList = list(x = "Assay Name")
)
Questions
Could we use different classifiers?
What is the balanced accuracy? Why would we select this measure instead?
# Try the code!
This workshop has shown how to explore some processed spatial transcriptomic and IMC datasets, as well as perform analytical techniques made available in the scdney package. We hope you enjoyed the session and will continue using some of the ideas and tools we showed you for your own research questions. Reach out to the team either directly or via the Github.
R version: R version 4.3.1 (2023-06-16 ucrt)
Bioconductor version: 3.17
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22621)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## time zone: Australia/Sydney
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 BumpyMatrix_1.8.0
## [3] plotly_4.10.2 patchwork_1.1.2
## [5] batchelor_1.16.0 scater_1.28.0
## [7] scuttle_1.10.1 ggplot2_3.4.2
## [9] ClassifyR_3.4.6 survival_3.5-5
## [11] BiocParallel_1.34.2 MultiAssayExperiment_1.26.0
## [13] generics_0.1.3 lisaClust_1.8.0
## [15] spicyR_1.12.0 FuseSOM_1.2.0
## [17] simpleSeg_1.2.0 scHOT_1.12.0
## [19] imcdatasets_1.8.0 cytomapper_1.12.0
## [21] EBImage_4.42.0 STexampleData_1.8.0
## [23] SpatialExperiment_1.10.0 SingleCellExperiment_1.22.0
## [25] SummarizedExperiment_1.30.2 Biobase_2.60.0
## [27] GenomicRanges_1.52.0 GenomeInfoDb_1.36.1
## [29] IRanges_2.34.0 S4Vectors_0.38.1
## [31] MatrixGenerics_1.12.2 matrixStats_1.0.0
## [33] ExperimentHub_2.8.1 AnnotationHub_3.8.0
## [35] BiocFileCache_2.8.0 dbplyr_2.3.3
## [37] BiocGenerics_0.46.0
##
## loaded via a namespace (and not attached):
## [1] spatstat.sparse_3.0-1 bitops_1.0-7
## [3] httr_1.4.6 DataVisualizations_1.3.0
## [5] numDeriv_2016.8-1.1 tools_4.3.1
## [7] backports_1.4.1 ResidualMatrix_1.10.0
## [9] utf8_1.2.3 R6_2.5.1
## [11] vegan_2.6-4 HDF5Array_1.28.1
## [13] lazyeval_0.2.2 mgcv_1.8-42
## [15] rhdf5filters_1.12.1 permute_0.9-7
## [17] withr_2.5.0 sp_2.0-0
## [19] analogue_0.17-6 gridExtra_2.3
## [21] cli_3.6.1 spatstat.explore_3.2-1
## [23] profileModel_0.6.1 labeling_0.4.2
## [25] sass_0.4.6 brglm_0.7.2
## [27] nnls_1.4 spatstat.data_3.0-1
## [29] genefilter_1.82.1 proxy_0.4-27
## [31] systemfonts_1.0.4 yulab.utils_0.0.6
## [33] svglite_2.1.1 R.utils_2.12.2
## [35] limma_3.56.2 rstudioapi_0.15.0
## [37] RSQLite_2.3.1 gridGraphics_0.5-1
## [39] spatstat.random_3.1-5 car_3.1-2
## [41] dplyr_1.1.2 scam_1.2-14
## [43] Matrix_1.6-0 ggbeeswarm_0.7.2
## [45] fansi_1.0.4 abind_1.4-5
## [47] R.methodsS3_1.8.2 terra_1.7-39
## [49] lifecycle_1.0.3 yaml_2.3.7
## [51] edgeR_3.42.4 carData_3.0-5
## [53] rhdf5_2.44.0 grid_4.3.1
## [55] blob_1.2.4 promises_1.2.0.1
## [57] dqrng_0.3.0 crayon_1.5.2
## [59] shinydashboard_0.7.2 lattice_0.21-8
## [61] beachmat_2.16.0 annotate_1.78.0
## [63] KEGGREST_1.40.0 magick_2.7.4
## [65] pillar_1.9.0 knitr_1.43
## [67] boot_1.3-28.1 rjson_0.2.21
## [69] codetools_0.2-19 glue_1.6.2
## [71] FCPS_1.3.3 data.table_1.14.8
## [73] vctrs_0.6.3 png_0.1-8
## [75] gtable_0.3.3 cachem_1.0.8
## [77] xfun_0.39 princurve_2.1.6
## [79] S4Arrays_1.0.4 mime_0.12
## [81] DropletUtils_1.20.0 coop_0.6-3
## [83] pheatmap_1.0.12 interactiveDisplayBase_1.38.0
## [85] ellipsis_0.3.2 nlme_3.1-162
## [87] bit64_4.0.5 filelock_1.0.2
## [89] bslib_0.5.0 irlba_2.3.5.1
## [91] svgPanZoom_0.3.4 vipor_0.4.5
## [93] colorspace_2.1-0 DBI_1.1.3
## [95] raster_3.6-23 mnormt_2.1.1
## [97] tidyselect_1.2.0 bit_4.0.5
## [99] compiler_4.3.1 curl_5.0.1
## [101] BiocNeighbors_1.18.0 DelayedArray_0.26.3
## [103] scales_1.2.1 psych_2.3.6
## [105] rappdirs_0.3.3 goftest_1.2-3
## [107] tiff_0.1-11 stringr_1.5.0
## [109] digest_0.6.31 minqa_1.2.5
## [111] fftwtools_0.9-11 spatstat.utils_3.0-3
## [113] rmarkdown_2.23 XVector_0.40.0
## [115] htmltools_0.5.5 pkgconfig_2.0.3
## [117] jpeg_0.1-10 lme4_1.1-33
## [119] sparseMatrixStats_1.12.0 highr_0.10
## [121] fastmap_1.1.1 rlang_1.1.1
## [123] htmlwidgets_1.6.2 shiny_1.7.4.1
## [125] DelayedMatrixStats_1.22.1 farver_2.1.1
## [127] jquerylib_0.1.4 jsonlite_1.8.5
## [129] mclust_6.0.0 R.oo_1.25.0
## [131] BiocSingular_1.16.0 RCurl_1.98-1.12
## [133] magrittr_2.0.3 GenomeInfoDbData_1.2.10
## [135] ggplotify_0.1.1 Rhdf5lib_1.22.0
## [137] munsell_0.5.0 Rcpp_1.0.10
## [139] viridis_0.6.3 stringi_1.7.12
## [141] zlibbioc_1.46.0 MASS_7.3-60
## [143] plyr_1.8.8 parallel_4.3.1
## [145] ggrepel_0.9.3 deldir_1.0-9
## [147] Biostrings_2.68.1 splines_4.3.1
## [149] tensor_1.5 locfit_1.5-9.8
## [151] ranger_0.15.1 igraph_1.5.0
## [153] ggpubr_0.6.0 spatstat.geom_3.2-1
## [155] ggsignif_0.6.4 reshape2_1.4.4
## [157] ScaledMatrix_1.8.1 XML_3.99-0.14
## [159] BiocVersion_3.17.1 evaluate_0.21
## [161] BiocManager_1.30.21 nloptr_2.0.3
## [163] tweenr_2.0.2 httpuv_1.6.11
## [165] tidyr_1.3.0 purrr_1.0.1
## [167] polyclip_1.10-4 reshape_0.8.9
## [169] ggforce_0.4.1 rsvd_1.0.5
## [171] broom_1.0.5 xtable_1.8-4
## [173] rstatix_0.7.2 later_1.3.1
## [175] class_7.3-22 viridisLite_0.4.2
## [177] tibble_3.2.1 lmerTest_3.1-3
## [179] memoise_2.0.1 beeswarm_0.4.0
## [181] AnnotationDbi_1.62.2 cluster_2.1.4
## [183] concaveman_1.1.0